Multilingual Projection for Parsing Truly Low-Resource Languages

نویسندگان

  • Zeljko Agic
  • Anders Johannsen
  • Barbara Plank
  • Héctor Martínez Alonso
  • Natalie Schluter
  • Anders Søgaard
چکیده

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cross-Lingual Parser Selection for Low-Resource Languages

In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores, with minimal requirements. Still, selecting the single best parser for any target language poses a challenge. Here, we propose a lean method for parser selection. It offers top performance, and it does so without disadvantaging the truly low-resource languages. We c...

متن کامل

Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages

In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding. We present an end-to-end graph-based neural network dependency parser that can be trained to reproduce matrices of edge scores, which can be directly projected across word alignments. We show that our approach to cross-lingual dependency parsing is not only simpler, but also a...

متن کامل

Multilingual Structural Projection across Interlinear Text

This paper explores the potential for annotating and enriching data for low-density languages via the alignment and projection of syntactic structure from parsed data for resource-rich languages such as English. We seek to develop enriched resources for a large number of the world’s languages, most of which have no significant digital presence. We do this by tapping the body of Web-based lingui...

متن کامل

Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks

This work focuses on the rapid development of linguistic annotation tools for resource-poor languages. We experiment several cross-lingual annotation projection methods using Recurrent Neural Networks (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target language. More precisely, our metho...

متن کامل

Delexicalized transfer parsing for low-resource languages using transformed and combined treebanks

This paper describes the IIT Kharagpur dependency parsing system in CoNLL2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We primarily focus on the lowresource languages (surprise languages). We have developed a framework to combine multiple treebanks to train parsers for low resource languages by a delexicalization method. We have applied transformation on the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • TACL

دوره 4  شماره 

صفحات  -

تاریخ انتشار 2016